Development of Strategies for SNP Detection in RNA-Seq Data: Application to Lymphoblastoid Cell Lines and Evaluation Using 1000 Genomes Data

Authors

  • Emma M. Quinn
  • Paul Cormican
  • Elaine M. Kenny
  • Matthew Hill
  • Richard Anney
  • Michael Gill
  • Aiden P. Corvin
  • Derek W. Morris
Abstract

Next-generation RNA sequencing (RNA-seq) maps and analyzes transcriptomes and generates data on sequence variation in expressed genes. There are few reported studies on analysis strategies to maximize the yield of quality RNA-seq SNP data. We evaluated the performance of different SNP-calling methods following alignment to both genome and transcriptome by applying them to RNA-seq data from a HapMap lymphoblastoid cell line sample and comparing the results with sequence variation data from 1000 Genomes. We determined that the best method to achieve high specificity and sensitivity, and the greatest number of SNP calls, is to remove duplicate sequence reads after alignment to the genome and to call SNPs using SAMtools. The accuracy of SNP calls depends on the available sequence coverage. In terms of specificity, 89% of RNA-seq SNP calls were true variants where coverage was >10X. In terms of sensitivity, at >10X coverage 92% of all expected SNPs in expressed exons could be detected. Overall, the results indicate that RNA-seq SNP data are a very useful by-product of sequence-based transcriptome analysis. If RNA-seq is applied to disease tissue samples, and assuming that genes carrying mutations relevant to disease biology are being expressed, a very high proportion of these mutations can be detected.
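The two accuracy figures in the abstract can be illustrated with a small sketch. It assumes, following the abstract's wording, that "specificity" means the fraction of RNA-seq SNP calls that are true variants and "sensitivity" means the fraction of expected SNPs that were detected, both restricted to sites above a coverage threshold. The function name `evaluate_calls`, the data layout, and the toy sites are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set, Tuple

# A site is identified by (chromosome, 1-based position). Hypothetical layout.
Site = Tuple[str, int]


def evaluate_calls(
    called: Set[Site],
    reference: Set[Site],
    coverage: Dict[Site, int],
    min_cov: int = 10,
) -> Tuple[float, float]:
    """Return (specificity, sensitivity) at sites with coverage > min_cov.

    specificity: share of called SNPs that are also in the reference set.
    sensitivity: share of reference SNPs that were also called.
    These definitions are an assumption based on the abstract's wording.
    """
    # Keep only sites whose read depth is strictly above the threshold.
    covered = {site for site, depth in coverage.items() if depth > min_cov}
    called_cov = called & covered
    expected_cov = reference & covered
    true_pos = called_cov & expected_cov

    # Guard against empty sets so the ratios are always defined.
    specificity = len(true_pos) / len(called_cov) if called_cov else float("nan")
    sensitivity = len(true_pos) / len(expected_cov) if expected_cov else float("nan")
    return specificity, sensitivity


# Toy data, invented purely for illustration.
coverage = {
    ("chr1", 100): 25,
    ("chr1", 200): 12,
    ("chr1", 300): 8,   # below the threshold, so ignored in both metrics
    ("chr2", 50): 40,
    ("chr2", 75): 15,
}
called = {("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)}
reference = {("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 75)}

spec, sens = evaluate_calls(called, reference, coverage, min_cov=10)
print(f"specificity={spec:.2f} sensitivity={sens:.2f}")
```

In practice the called set would come from a variant caller's output and the reference set from a 1000 Genomes call set for the same sample; parsing those files is left out of the sketch.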

Similar resources

O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis

Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...

I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing

Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, the crucial transcriptional network and key hub genes that regulate the progression of preimplantation embryos remain unclear. Materials and Methods: Through single-cell RNA-sequencing (RNA-seq) of both human and mouse preimplantation embryos, ...

3D Detection of Power-Transmission Lines in Point Clouds Using Random Forest Method

Inspection of power transmission lines using classic expert-based methods suffers from disadvantages such as high time and cost requirements. The advent of UAVs and their application in aerial data gathering help to reduce the time and cost considerably. The purpose of this research is to present an efficient automated method for inspection of power transmission lines based on point c...


Journal title:

Volume 8, Issue

Pages -

Publication date: 2013